Section 3 Local polynomial filters
In this section we detail the filters that arise from fitting a local polynomial to the time series, as described by Proietti and Luati (2008). Local polynomial filters encompass classical filters such as the Henderson and Musgrave filters (see sections 3.1.1 and 3.2.2).
We assume that the time series \(y_t\) can be decomposed as \[ y_t=\mu_t+\varepsilon_t \] where \(\mu_t\) is the signal (trend) and \(\varepsilon_{t}\overset{i.i.d.}{\sim}\mathcal{N}(0,\sigma^{2})\) is the noise. We assume that \(\mu_t\) can be locally approximated by a polynomial of degree \(d\) in \(j\), fitted to \(y_t\) and its neighboring observations \(\left(y_{t+j}\right)_{j\in\left\llbracket -h,h\right\rrbracket}\). Then \(\mu_t\simeq m_{t}\) with: \[ \forall j\in\left\llbracket -h,h\right\rrbracket :\: y_{t+j}=m_{t+j}+\varepsilon_{t+j},\quad m_{t+j}=\sum_{i=0}^{d}\beta_{i}j^{i} \] The signal extraction problem is then equivalent to the estimation of \(m_t=\beta_0\). In matrix notation we can write: \[ \underbrace{\begin{pmatrix}y_{t-h}\\ y_{t-(h-1)}\\ \vdots\\ y_{t}\\ \vdots\\ y_{t+(h-1)}\\ y_{t+h} \end{pmatrix}}_{y}=\underbrace{\begin{pmatrix}1 & -h & h^{2} & \cdots & (-h)^{d}\\ 1 & -(h-1) & (h-1)^{2} & \cdots & (-(h-1))^{d}\\ \vdots & \vdots & \vdots & \cdots & \vdots\\ 1 & 0 & 0 & \cdots & 0\\ \vdots & \vdots & \vdots & \cdots & \vdots\\ 1 & h-1 & (h-1)^{2} & \cdots & (h-1)^{d}\\ 1 & h & h^{2} & \cdots & h^{d} \end{pmatrix}}_{X}\underbrace{\begin{pmatrix}\beta_{0}\\ \beta_{1}\\ \vdots\\ \vdots\\ \vdots\\ \vdots\\ \beta_{d} \end{pmatrix}}_{\beta}+\underbrace{\begin{pmatrix}\varepsilon_{t-h}\\ \varepsilon_{t-(h-1)}\\ \vdots\\ \varepsilon_{t}\\ \vdots\\ \varepsilon_{t+(h-1)}\\ \varepsilon_{t+h} \end{pmatrix}}_{\varepsilon} \] Two parameters are crucial in determining the accuracy of the approximation:
the degree \(d\) of the polynomial;
the number of neighbors \(H=2h+1\) (or the bandwidth \(h\)).
In order to estimate \(\beta\) we need \(H\geq d+1\); the estimation is done by weighted least squares (WLS), which consists of minimizing the following objective function: \[ S(\hat{\beta}_{0},\dots,\hat{\beta}_{d})=\sum_{j=-h}^{h}\kappa_{j}(y_{t+j}-\hat{\beta}_{0}-\hat{\beta}_{1}j-\dots-\hat{\beta}_{d}j^{d})^{2} \] where the \(\kappa_j\) are a set of weights called a kernel, with \(\kappa_j\geq 0\) and \(\kappa_{-j}=\kappa_j\). Writing \(K=\operatorname{diag}(\kappa_{-h},\dots,\kappa_{h})\), the estimate of \(\beta\) can be written as \(\hat{\beta}=(X'KX)^{-1}X'Ky\). With \(e_{1}=\begin{pmatrix}1&0&\cdots&0\end{pmatrix}'\), the estimate of the trend is: \[ \hat{m}_{t}=e_{1}'\hat{\beta}=w'y=\sum_{j=-h}^{h}w_{j}y_{t-j}\text{ with }w=KX(X'KX)^{-1}e_{1} \] To conclude, the estimate of the trend \(\hat{m}_{t}\) is obtained by applying the symmetric filter \(w\) to \(y_t\). Moreover, \(X'w=e_{1}\), so: \[ \sum_{j=-h}^{h}w_{j}=1,\quad\forall r\in\left\llbracket 1,d\right\rrbracket :\sum_{j=-h}^{h}j^{r}w_{j}=0 \] Hence, the filter \(w\) preserves deterministic polynomials of degree \(d\).
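The computation of the filter weights \(w=KX(X'KX)^{-1}e_{1}\) can be sketched in Python (an illustration, not code from this report; `henderson` implements the Henderson kernel given in section 3.1, and the choice \(h=6\), \(d=3\) reproduces a 13-term Henderson-type filter):

```python
import numpy as np

def local_poly_weights(h, d, kernel):
    """Symmetric local polynomial filter weights w = K X (X'KX)^{-1} e_1,
    where X is the design matrix with columns 1, j, j^2, ..., j^d."""
    j = np.arange(-h, h + 1)
    X = np.vander(j, d + 1, increasing=True)
    K = np.diag(kernel(j, h))
    e1 = np.zeros(d + 1)
    e1[0] = 1.0
    return K @ X @ np.linalg.solve(X.T @ K @ X, e1)

def henderson(j, h):
    # Henderson kernel (unnormalized), as given in section 3.1
    return ((1 - j**2 / (h + 1)**2)
            * (1 - j**2 / (h + 2)**2)
            * (1 - j**2 / (h + 3)**2))

w = local_poly_weights(h=6, d=3, kernel=henderson)
j = np.arange(-6, 7)
print(np.isclose(w.sum(), 1.0))                               # weights sum to 1
print(np.allclose([(j**r * w).sum() for r in (1, 2, 3)], 0))  # moments vanish
```

The two checks at the end verify the polynomial-preservation property \(X'w=e_1\): the weights sum to one and the first \(d\) moments are zero, so any cubic trend passes through the filter unchanged.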
3.1 Different kernels
In signal extraction, observations are generally weighted according to their distance from time \(t\): this is the role of the kernel function. In the discrete case, a kernel function is a set of weights \(\kappa_j\), \(j=0,\pm1,\dots,\pm h\), with \(\kappa_j \geq0\) and \(\kappa_j=\kappa_{-j}\). An important class of kernels is the Beta kernels. In the discrete case, up to a proportionality factor (so that \(\sum_{j=-h}^h\kappa_j=1\)): \[ \kappa_j = \left( 1- \left\lvert \frac j {h+1} \right\rvert^r \right)^s \] with \(r>0\), \(s\geq 0\). This class encompasses all kernels used in this report except the Henderson, trapezoidal and Gaussian kernels. The following kernels are considered in this report:
\(r=1,s=0\) uniform kernel: \[\kappa_j^U=1\]
\(r=s=1\) triangle kernel: \[\kappa_j^T=\left( 1- \left\lvert \frac j {h+1} \right\rvert \right)\]
\(r=2,s=1\) Epanechnikov (or parabolic) kernel: \[\kappa_j^E=\left( 1- \left\lvert \frac j {h+1} \right\rvert^2 \right)\]
\(r=s=2\) biweight kernel: \[\kappa_j^{BW}=\left( 1- \left\lvert \frac j {h+1} \right\rvert^2 \right)^2\]
\(r = 2, s = 3\) triweight kernel: \[\kappa_j^{TW}=\left( 1- \left\lvert \frac j {h+1} \right\rvert^2 \right)^3\]
\(r = s = 3\) tricube kernel: \[\kappa_j^{TC}=\left( 1- \left\lvert \frac j {h+1} \right\rvert^3 \right)^3\]
Henderson kernel (see section 3.1.1 for more details): \[ \kappa_{j}=\left[1-\frac{j^2}{(h+1)^2}\right] \left[1-\frac{j^2}{(h+2)^2}\right] \left[1-\frac{j^2}{(h+3)^2}\right] \]
Trapezoidal kernel: \[ \kappa_j^{TP}= \begin{cases} \frac{1}{3(2h-1)} & \text{ if }j=\pm h \\ \frac{2}{3(2h-1)} & \text{ if }j=\pm (h-1)\\ \frac{1}{2h-1}& \text{ otherwise} \end{cases} \]
Gaussian kernel: \[ \kappa_j^G=\exp\left( -\frac{ j^2 }{ 2\sigma^2h^2 }\right) \]
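The Beta-family weights above can be generated by one short Python function (a minimal sketch with a hypothetical helper name; the normalization enforces \(\sum_{j=-h}^h\kappa_j=1\)):

```python
import numpy as np

def beta_kernel(j, h, r, s):
    """Discrete Beta kernel (1 - |j/(h+1)|^r)^s, normalized to sum to one."""
    k = (1.0 - np.abs(j / (h + 1.0))**r)**s
    return k / k.sum()

j = np.arange(-3, 4)                          # bandwidth h = 3, H = 7 points
uniform      = beta_kernel(j, 3, r=1, s=0)    # all weights equal to 1/7
epanechnikov = beta_kernel(j, 3, r=2, s=1)    # parabolic, peaked at j = 0
```

Choosing \((r,s)\) from the list above yields each named kernel; for instance \(s=0\) makes every unnormalized weight equal to one, which gives the uniform kernel.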
The Henderson, trapezoidal and Gaussian kernels are specific in two respects:
The Henderson and trapezoidal kernel functions change with the bandwidth (the other kernels depend only on the ratio \(j/(h+1)\)).
Other definitions of the trapezoidal and Gaussian kernels can be used. The trapezoidal kernel is considered here because it corresponds to the filter used to extract the seasonal component in the X-12ARIMA algorithm; it is therefore never used to extract the trend-cycle component.
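The bandwidth dependence of the Henderson kernel can be checked numerically. The sketch below (illustrative only) evaluates the kernel at the same scaled abscissa \(j/(h+1)=2/7\) for two bandwidths and obtains different values, whereas any Beta kernel, being a function of \(j/(h+1)\) alone, would give identical ones:

```python
import numpy as np

def henderson_kernel(j, h):
    """Henderson kernel from section 3.1 (unnormalized)."""
    return ((1 - j**2 / (h + 1)**2)
            * (1 - j**2 / (h + 2)**2)
            * (1 - j**2 / (h + 3)**2))

# Same scaled abscissa j/(h+1) = 2/7, two different bandwidths:
v1 = henderson_kernel(2, 6)    # j = 2, h = 6   -> 2/7
v2 = henderson_kernel(4, 13)   # j = 4, h = 13  -> 4/14 = 2/7
# v1 != v2: the (h+2) and (h+3) factors make the shape depend on h.
```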